ftp.cs.arizona.edu

home *** CD-ROM | disk | FTP | other *** search

/ ftp.cs.arizona.edu / ftp.cs.arizona.edu.tar / ftp.cs.arizona.edu / icon / newsgrp / group93a.txt / 000003_icon-group-sender _Sun Jan 3 21:17:57 1993.msg < prev next >

Wrap

Internet Message Format | 1993-04-21 | 34KB

Received: by cheltenham.cs.arizona.edu; Mon, 4 Jan 1993 04:34:12 MST Date: 3 Jan 93 21:17:57 GMT From: agate!spool.mu.edu!olivea!pagesat!spssig.spss.com!uchinews!ellis!goer@ucbvax.Berkeley.EDU (Richard L. Goerwitz) Organization: University of Chicago Subject: parser generator, part 1 Message-Id: <1993Jan3.211757.28395@midway.uchicago.edu> Sender: icon-group-request@cs.arizona.edu To: icon-group@cs.arizona.edu Status: R Errors-To: icon-group-errors@cs.arizona.edu Warning - Don't bother unpacking this file unless a) you know what a parser generator is, and b) you are interested in thinking about how one should look for Icon. -Richard Goerwitz ---- Cut Here and feed the following to sh ---- #!/bin/sh # This is a shell archive (produced by shar 3.49) # To extract the files from this archive, save it to a file, remove # everything above the "!/bin/sh" line above, and type "sh file_name". # # made 01/03/1993 20:21 UTC by richard@zenu # Source directory /u/richard/Ibpag # # existing files will NOT be overwritten unless -c is specified # This format requires very little intelligence at unshar time. # "if test", "cat", "rm", "echo", "true", and "sed" may be needed. # # This is part 1 of a multipart archive # do not concatenate these parts, unpack them in order with /bin/sh # # This shar contains: # length mode name # ------ ---------- ------------------------------------------ # 41902 -rw-r--r-- ibpag.icn # 18563 -r--r--r-- maketbls.icn # 9627 -r--r--r-- preproc.icn # 25415 -r--r--r-- itokens.icn # 3663 -r--r--r-- debugme.icn # 1190 -r--r--r-- errors.icn # 2062 -r--r--r-- slashupto.icn # 4314 -r--r--r-- rewrap.icn # 762 -r--r--r-- strip.icn # 1752 -rw-r--r-- Makefile.dist # if test -r _shar_seq_.tmp; then echo 'Must unpack archives in sequence!' echo Please unpack part `cat _shar_seq_.tmp` next exit 1 fi # ============= ibpag.icn ============== if test -f 'ibpag.icn' -a X"$1" != X"-c"; then echo 'x - skipping ibpag.icn (File already exists)' rm -f _shar_wnt_.tmp else > _shar_wnt_.tmp echo 'x - extracting ibpag.icn (Text)' sed 's/^X//' << 'SHAR_EOF' > 'ibpag.icn' && X############################################################################ X# X# Name: ibpag.src X# X# Title: Icon-based parser generator X# X# Author: Richard L. Goerwitz X# X# Version: 1.21 X# X############################################################################ X# X# X# General Description: X# X# IBPAG is a simple tool for generating parsers from grammar X# specifications. Sounds pretty mundane, but those who have never X# used a parser generator will perhaps be surprised to find that this X# kind of tool forms the basis of most modern programming language X# implementations. Parser generators are also used in preprocessors, X# transducers, compilers, interpreters, calculators and in fact for X# just about any situation where some form of structured input needs X# to be converted into some form of structured output. X# X# IBPAG is not a full, working system. In particular, startup X# times for compiled IBPAG programs is very slow, since I can't find X# a good way of writing/reconstructing the necessary structures. For X# applications where a startup delay is not a problem, IBPAG will X# work fine. What IBPAG is supposed to be is a very preliminary X# crack at designing a parser generator for Icon. It really ought to X# be redone in C some day, and its error handling facilities need to X# be tested better. It also ought to be redone as an LALR (and not X# LR) parser, and some method of storing parse tables ought to be X# found that permits fast/easy reconstruction at run-time. This X# version has to be regarded as in beta testing. I doubt I'll ever X# take it, however, much further - at least in its present form. X# X# X# More technical description: X# X# Technically speaking, IBPAG is a preprocessor that accepts a X# Icon-like source file from the standard input containing grammar X# productions and actions, converts them into parsing tables and X# associated code, and then adds to this an LR parser, tokenizer, X# error handler, and a few debugging tools. Writes the combination X# to the standard output, along with the necessary action and goto X# tables. The tables are generated using a very general (but also X# very sluggish) LR(1) algorithm. IBPAG is not terribly practical X# for most micros. It was written as an excercize by me (a linguist X# with far too little CS training) while auditing a compiler course X# as a some-time diversion from many dissertation rewrites... X# X# Cycles and epsilon moves are handled correctly (to my X# knowledge). Shift-reduce conflicts are handled in the normal way X# (i.e. pick the rule with the highest priority, and, in cases where X# the priority is the same, check the associativities; flag X# reduce/reduce conflicts as errors, and for shift/reduce conflicts X# pick shift by default [shift/shift conflicts don't matter, since we X# shift in either case, and no reduction takes place]). X# X# X# Running IBPAG: X# X# Invoking IBPAG is very, very simple. There are just two X# (optional) command-line switches: X# X# ibpag [-d] [-v] < inputfile > outputfile X# X# Where the -d switch causes profuse debugging messages to be X# displayed, and the -v switch ("verbose") causes somewhat more X# restrained messages to be emitted. -d implies -v. Inputfile is an X# IBPAG source file (see below on its format). Outputfile is an LR X# parser with appropriate tables, generated from the specs contained X# in inputfile. X# X# For more information how to set up IBPAG files and incorporate X# them into larger Icon programs, see the section just below, "How to X# use IBPAG." X# X############################################################################ X# X# X# How to use IBPAG X# ________________ X# X# X# I. Prerequisites X# X# The only prerequisites to using IBPAG are a knowledge of how to X# write a basic LR-parsable context-free grammar and a familiarity X# with the Icon programming language. IBPAG source files use X# essentially the same syntax as Icon source files. In a couple of X# cases, though the semantics of are quite different. IBPAG is X# essentially just a preprocessor. IBPAG-specific constructs, X# therefore, should be seen, not as Icon expressions, but as X# directives to the preprocessor, specifying how to construct the X# actual output code. X# X# In the sections that follow, I will discuss how to set up an X# IBPAG file, i.e. how to declare a start symbol and write rules. X# The issue of the tokenizer will then be addressed, along with a X# treatment of epsilon, and two reserved tokens: $ and error. X# Finally the overall compatibility of IBPAG and Icon code will be X# discussed, as well as how to start the IBPAG parser from the Icon X# code. If you still have problems and/or questions, try reading the X# UNIX manual page on YACC. IBPAG and YACC have some superficial X# similarities that may be helpful in sorting out why IBPAG works the X# way it does. Alternatively write to me, Richard Goerwitz, via X# e-mail at goer@midway.uchicago.edu. I'll be glad to field X# questions and provide anyone with the latest version I have X# on-hand. Doubtless there will be many bugs to find, and I'll be X# happy to fix any that get reported. At some point, the entire X# table generating process (in maketbls.icn) will need to be ripped X# out, and replaced with one of the new, faster LR(1) (or perhaps X# LALR) algorithms. X# X# X# II. The start symbol X# X# All context-free grammars have a start-symbol, i.e. a terminal X# that can be considered the "goal" of a parse. Because cycles (and X# bugs) in the grammar can make it hard for IBPAG to determine what X# the start symbol is, it must be declared. This is done with a X# start-symbol declaration, which has the following form: X# X# <start-symbol-declaration> ::= start_symbol identifier X# X# If no start-symbol declaration appears in an IBPAG source file, X# IBPAG assumes S as the start symbol. X# X# X# III. Rule declarations X# X# The heart of the IBPAG source file is the rule declaration. X# Basically, rule declarations tell IBPAG the structure of the grammar X# it is to assume when generating a parser. They also tell it what X# rules to associate with what Icon code. Rule declarations look a X# lot like procedure declarations, except that they begin with "rule" X# instead of "procedure" and must specify a priority and X# associativity. For those with formalistic leanings, the format of X# the rule declaration is as follows (opt = optional): X# X# <rule-declaration> ::= <rule-header> ... end X# <rule-header> ::= rule <priority> <associativity> \ X# <left-hand-symbol> ( <right-hand-symbol-list> ) X# <priority> ::= integer-literal (opt) X# <associativity> ::= left | right | none (opt) X# <left-hand-side> ::= identifier X# <right-hand-side> ::= <symbol-spec> | <symbol-spec>,<right-hand-side> X# <symbol-spec> ::= <symbol> | <symbol> \| <symbol-spec> X# <symbol> ::= identifier | string-literal X# X# The "..." indicates material that is precisely the same as for a X# standard Icon procedure declaration. The only difference between a X# rule and a procedure (syntactically, that is) is in the header. X# Superficially, rule and procedure headers look a lot alike. The X# rule headers, though, have extra elements (optional priority and X# associativity). They also interpret arguments in the option list X# quite differently than Icon does. In particular, these arguments X# are either strings, identifiers, or any combination of these X# separated by a bar. Strings correspond to terminals, identifiers X# to nonterminals. The bar will be discussed below. X# X# For those with a more practical bent, here is a sample rule X# definition: X# X# rule 1 none S(NP, VP) X# return ["S", arg1, arg2] X# end X# X# The 1 denotes the priority (here very low), while "none" specifies X# the associativity (can be right, left, or none). S is the X# left-hand side of the rule, and NP and VP are the (nonterminal) X# symbols that make up its right-hand side. Normally, the priority X# is not needed (the default is 1). In fact, it is best to try to X# write grammars without any priorities or associativity rules. The X# default associativity is none. X# X# Please note the body of the rule itself. The code there is X# triggered whenever an S is found (i.e. when a reduction via the X# above S-rule occurs during the parse). For those who know YACC, X# arg1 is equivalent to $1 and the return expression is equivalent to X# $$ =. Essentially, the above rule definition says that an S is X# composed of an NP and a VP, and that if an S is encountered, the X# parser is to create a list having three members, the first being X# the string "S", the second being the result that was returned when X# the NP was found, and the third being the result returned when the X# VP was encountered. Note that the above rule implies the existence X# of at least two other rules having NP and VP, respectively, as X# their left-hand sides. For example: X# X# rule NP("det", "noun") X# return ["NP", arg1, arg2] X# end X# X# rule VP("v", NP) X# return ["VP", arg1, arg2] X# end X# X# If two or more right-hand sides differ in just a single symbol, X# it is acceptable to combine them into one declaration using a X# vertical slash or bar. Although this bar looks like the usual Icon X# alternation operator, in this context IBPAG uses it as a macro X# device. In the following case, two rules are generated, each with X# one of the alternatives, "that" or "which," in its RHS. The same X# code is triggered by a reduction via either rule: X# X# rule RELCL("that"|"which", S) X# return ["RELCL", arg1, arg2] X# end X# X# X# IV. The tokenizer X# X# Note that in the above examples, "v," "det," "noun," "that", and X# "which" are considered terminal symbols. The user must, in fact, X# create a lexical analyzer called iparse_tokens that returns these X# symbols. The precise form in which iparse_tokens must function is X# as follows: X# X# iparse_tokens: file -> TOK records X# (stream) -> Ts X# Where stream is an open file, and Ts are TOK records. X# As a side-effect, iparse_tokens should increment the global X# variable line_number (once right away, then once for each X# newline encountered, or anything equivalent thereto) X# X# The -> notation above implies that iparse_tokens is a pure X# function. I just can't think of a good notation for generators. X# Iparse_tokens in fact *must be* a generator. TOK records contain X# the terminal symbol names in their first field. If the literal X# string value of the symbols is to be used by any of the rule code, X# it should be stored in the second field of the TOK record. A X# typical value returned might be: X# X# TOK("v", "eat") X# X# YACC/LEX's heterogenous yylval/return (TOKEN) strategy is really X# quite inelegant, and although my system isn't perfect, it avoids X# the global variable, and hence makes running simultaneous parsers a X# bit simpler undertaking (the only globals to worry about in IBPAG X# are line_number, errors, and fail_on_error, all of which can, if X# desired, be ignored). X# X# If you want line numbers to be output with any error messages X# that the parser might report, then make sure that the tokenizer X# keeps track of how many lines it has read by incrementing X# line_number. The parser itself automatically initializes this X# (global) variable to 0, so all you'll normally have to do is X# increment it by one for each newline the tokenizer encounters. If X# you leave line_number alone, the parser simply does not print out X# line numbers for errors it reports (which is somewhat less helpful, X# but workable nonetheless). X# X# X# V. Epsilon X# X# If needed, epsilon (i.e. the empty string) can be specified by X# an empty Icon string (e.g. rule REL(""|"that")). "" is a perfectly X# legal token. It is absolutely vital, however, to note that X# *epsilon is never pushed onto the value stack*. Hence in the above X# instance, if arg1 were to be accessed, an error would result if a X# reduction via REL -> epsilon (rather than by REL -> "that") X# occurred! For this reason it is wise to keep epsilon isolated X# (e.g. rule 10 REL("") and rule 11 REL("that"); note that giving X# both rules an associativity probably isn't necessary here, although X# giving the second rule the higher priority often *will* be). Note X# also that the tokenizer must never return an empty string as a X# token (i.e. TOK("")). Not only will the parser refuse to place it X# on the value stack, but it will not even recognize it as a valid X# lookahead symbol (since, by definition, it doesn't contain X# anything). X# X# X# VI. Reserved tokens ($ and error) X# X# Besides epsilon, the only string (qua symbol) the tokenizer X# cannot return at will is a TOK with either "error" or "$" as its X# first field (e.g. TOK("$")). The $ symbol denotes end-of-input, X# and *must* be returned by the tokenizer when the input stream ends X# (or when, for some other reason, the token stream has stopped). I X# could have cut the code so that failure or &null signalled the same X# thing, but I prefer to use failure of the tokenizer to signal an X# unexpected end of input, and I don't like the inconsistency of X# having &null be a valid token. The bottom line is that the X# tokenizer must return TOK("$") on end-of-input. You'll get used to X# it :-). X# X# The "error" terminal is reserved for error productions. If a X# syntax error occurs, and the token "error" exists in the grammar, X# the parser will attempt to back up as one state to see if any of X# them permits "error" as a valid lookahead symbol. If so, the X# parser jumps to the state that would have resulted if the parser X# had encountered "error" at that point. What this does is allow us X# to write error-recovery rules: X# X# global error_count X# rule EXPR("(", EXPR, ")") X# return arg1 X# end X# rule REGEXP("(", REGEXP, "error") X# write(&errout, "unmatched \"(\"; resynchronizing") X# # Pretend we got a right parenthesis, and continue X# # processing. X# return arg1 X# end X# X# So what happens if we get an expression without an enclosing right X# parenthesis? We get an error message, an increment of the error X# count, and a resynchronized parser that's ready to go (and possibly X# find more errors). This mechanism is very much like the mechanism X# YACC uses. X# X# Note that IBPAG will only pop one state off of the stack in X# efforts to find a state for which "error" is a legitimate lookahead X# token; any more than this, and we could easily back so far out the X# current production that we would end up causing confusion about X# just where the error occurred! If you write error productions, X# make sure that the error terminal is at most one token removed from X# the spot where you expect errors to occur. X# X# X# VII. IBPAG-Icon code compatibility (plus gotchas) X# X# Regular Icon procedures may be freely intermingled with IBPAG X# rules, so you can include iparse_tokens() right in the same file as X# your rule definitions. Note that rules, other than the header, in X# fact, behave somewhat like procedures in the sense that when a rule X# is triggered, the parser executes the code for that rule just the X# way code for a procedure is executed when it is invoked. The main X# difference between Icon procedures and rules is that the rules can X# only return one value or fail. If you want a rule to be able to X# produce more than one value, then have it return a coexpression. X# Failure in a rule signals that the entire RHS of the production X# should be discarded, and the stack should be returned to the state X# it was in before that RHS's first symbol was shifted onto the X# stack. Failure is really just another way of providing the user X# more control over error handling and recovery. If you have nothing X# useful to return, but do *not* want to signal an error, BE SURE TO X# RETURN A NULL DESCRIPTOR ("return &null")! X# X# X# VII. The parser itself X# X# In the output file, the parser that IBPAG constructs is called X# iparse(). You should invoke it somewhere to get the parser X# started. It takes a single argument: A stream (i.e. an open file). X# You could pass it a string, theoretically, instead of a file. X# Iparse() passes its first argument to iparse_tokens(), so be sure X# iparse_tokens() understands that it is working on a string instead X# of a file if you choose to do things this way. IBPAG's output is X# compressed somewhat, but the parser and error handler are human- X# readable. Please look at them and modify them as needed. Note X# especially the debugging facilities that are included with them X# (e.g. dump_lists()). X# X# X# VIII. Miscellaneous X# X# As it encounters errors, the parser, iparse(), increments the X# global variable errors by one. Normally one botched production X# will cause a series of cascading errors. As a result, the error X# handler only increments "errors" for the first in any unbroken X# series of errors. When (and if) the parser manages to X# resynchronize itself, and the cascade terminates, the errors X# variable is once again incremented normally, once for each error X# encountered. X# X# If no error processing is desired at all, you may set the X# global variable fail_on_error to something other than &null. This X# tells iparser's error handler simply to fail if there are any X# syntax errors. Note that "error" productions (described above, in X# the section on reserved tokens) will still work if they have been X# used in the grammar. So in fact, setting fail_on_errors will cause X# failure to occur for all errors that cannot be handled by X# user-defined error productions. What this does is bypass the X# built-in error detection and resynchronization routines. This may X# be useful if an IBPAG file is part of a much larger Icon program X# that needs to have errors registered internally, and not spit out X# separately to stderr by a renegade parser :-). X# X############################################################################ X# X# Below is a sample IBPAG source file. It's kind of a hack, but X# I'm sure it will adequately illustrate the basic principles: X# X# link radcon X# global ID_tbl X# X# procedure main() X# iparse(&input) X# end X# X# rule 0 left S(stream) X# return X# end X# X# rule 1 left stream(calc, stream) X# return X# end X# X# rule 1 left stream(calc) X# return X# end X# X# rule 2 none calc("CR") X# return X# end X# X# rule 3 left calc(expr, "CR") X# # Radcon can't handle reals, so this doesn't really work. X# line_number +:= 1 X# return write(radcon(arg1, \(\ID_tbl)["ibase"] | 10, X# \(\ID_tbl)["obase"] | 10)) X# end X# X# rule 4 none expr("ID") X# if not (return \ID_tbl[arg1]) X# then write(&errout, "uninitialized variable, line ", line_number) X# fail X# end X# X# rule 6 right expr("ID", "=", expr) X# initial ID_tbl := table() X# ID_tbl[arg1] := arg3 X# return arg3 X# end X# X# rule 10 left expr(expr, "+", expr) X# return arg1 + arg3 X# end X# X# rule 10 left expr(expr, "-", expr) X# return arg1 - arg3 X# end X# X# rule 11 left expr(expr, "*", expr) X# return arg1 * arg3 X# end X# X# rule 11 left expr(expr, "/", expr) X# return arg1 / arg3 X# end X# X# rule 11 left expr(expr, "%", expr) X# return arg1 % arg3 X# end X# X# rule 12 left expr(expr, "^", expr) X# return arg1 ^ arg3 X# end X# X# rule 50 none expr("(", expr, ")") X# return arg2 X# end X# X# rule 75 right expr("-", expr) X# return -arg2 X# end X# X# rule 75 right expr("+", expr) X# return +arg2 X# end X# X# rule 100 none expr("NUM") X# if find(".", arg1) X# then return real(arg1) X# else return integer(arg1) X# end X# X# # X# # iparse_tokens: file -> TOK records (a generator) X# # stream -> tokens X# # X# # Where file is an open input stream, and tokens are TOK X# # records holding both the token type and actual token text. X# # X# # TOK records contain two parts, a preterminal symbol (the first X# # "sym" field), and the actual text of the token ("str"). The X# # parser above only pays attention to the sym field, although the X# # user might well want to use the actual text. X# # X# procedure iparse_tokens(stream) X# X# local token, c, whitespace, operators X# #global line_number X# whitespace := '\r\t \n' X# operators := '+-*/^()=' X# X# token := "" X# every c := !(!&input || "\n") do { X# if not any(whitespace ++ operators, c) then { X# token ||:= c X# } else { X# if integer(token) X# then suspend TOK("NUM", "" ~== token) X# else suspend TOK("ID", "" ~== token) X# if any(operators, c) then X# suspend TOK(c) X# else { X# if c == "\n" then { X# line_number +:= 1 X# suspend TOK("CR"|"CR") X# } X# } X# token := "" X# } X# } X# if integer(token) X# then suspend TOK("NUM", "" ~== token) X# else suspend TOK("ID", "" ~== token) X# suspend TOK("CR"|"$") X# X# end X# X############################################################################ X# X# Links: options.icn X# X# See also: maketbls.icn, preproc.icn X# X############################################################################ X Xlink options, structs, ximage X Xglobal DEBUG, VERBOSE X# For other record declarations, see maketbls.icn. Xrecord RE(state, procname, sym, size) Xrecord SH(state) Xrecord AC() X Xprocedure main(a) X X local f, f2, lname, opttbl, usage, tlst, sym, seen, X act, elt, size, r, i, k X # global ptbl, alst, glst X X usage := "usage: ibpag [-d] [-v] [-f input-file]" X X opttbl := options(a, "dvf:") | stop(usage) X if DEBUG := \opttbl["d"] X then VERBOSE := 1 X else VERBOSE := opttbl["v"] X f := \opttbl["f"] | &input X f2 := &output X X ptbl := table() X makeptbl(f, f2) X X # set up (global) alst and glst (the action and goto lists) X CONST_TABLE() X X # X # Set up tlst (a set of all tokens used in the grammar), then X # turn the ACT records into RE, SH, or AC records. X # X tlst := set() X # squeeze a list of rules out of the action list; find the X # terminals in their right-hand sides X every sym := !\(\(!(!alst)).by_rule).RHS do X \sym.terminal & insert(tlst, sym.str) X insert(tlst, "$") X X # X # Clobber all of the ACT records, and turn them into RE, SH, or AC X # records. Get rid of structurally identical, structures by X # making them all point to the same structure. X # X if \VERBOSE then X write(&errout, "Merging duplicate structures in action list...") X seen := set() X every i := 1 to *alst do { X if /alst[i] then next X if *alst[i] = 0 then { X alst[i] := &null X next X } X every k := key(alst[i]) do { X if alst[i][k].str == "reduce" then { X r := alst[i][k].by_rule X size := *r.RHS X # Don't count epsilon moves when calculating sizes! X every elt := !r.RHS do X if elt.str == "" & \elt.terminal then size -:= 1 X act := RE(alst[i][k].state, r.procname, r.LHS, size) X } X # Don't need to store all this info for shifts or accepts. X else if alst[i][k].str == "shift" X then act := SH(alst[i][k].state) X else act := AC() X if alst[i][k] := equiv(act, !seen) X then next X else { X alst[i][k] := act X insert(seen, act) X } X } X # Simple optimization; if alst[i] is structured identically to X # a previously seen table, use the previously seen one! Note X # that equiv is more general than Equiv (it can accept sets X # and tables). See the IPL for equiv(). Equiv() is in X # debug.src. X if alst[i] := equiv(alst[i], !seen) X then next else insert(seen, alst[i]) X } X X write(f2, "") X write(f2, "############################################################################") X write(f2, "#") X write(f2, "# The following code has been inserted by IBPAG (Icon-based") X write(f2, "# Parser Generator). It contains a stack-based LR parser, an") X write(f2, "# error handler, and a debugging tool. The user must supply a") X write(f2, "# tokenizing routine called iparse_tokens().") X write(f2, "#") X write(f2, "# For a description of IBPAG source-file format, see the") X write(f2, "# documentation prepended to ibpag.icn.") X write(f2, "#") X write(f2, "############################################################################") X write(f2, "#") X write(f2, "# See also: ibpag.icn, maketbls.icn, preproc.icn, itokens.icn") X write(f2, "#") X write(f2, "############################################################################") X write(f2, "") X write(f2, "# uncomment for the compiler") X write(f2, "# invocable \"symbol\", \"TOK\", \"RE\", \"SH\", \"AC\"") X write(f2, "") X write(f2, "# I use ximage for debugging; remove it if you don't need it.") X write(f2, "link codeobj#, ximage") X write(f2, "") X write(f2, "record symbol(str, terminal)") X write(f2, "record TOK(sym, str)") X write(f2, "record RE(state, procname, sym, size)") X write(f2, "record SH(state)") X write(f2, "record AC()") X write(f2, "") X write(f2, "global line_number, errors, fail_on_error") X write(f2, "") X write(f2, "#") X write(f2, "# iparse: file -> ?") X write(f2, "# stream -> ?") X write(f2, "#") X write(f2, "# Where stream is an open file, and ? represents the user-defined") X write(f2, "# result of a completed parse of file, from the current location") X write(f2, "# up to the point where the parser executes an \"accept\" action.") X write(f2, "#") X write(f2, "# The second to fifth arguments are used on recursive calls from") X write(f2, "# the error handler, iparse_error. Ignore them, unless you are") X write(f2, "# sure of what you are doing!") X write(f2, "#") X write(f2, "procedure iparse(stream, state_stack, value_stack, next_token, err_state)") X write(f2, "") X write(f2, " local start_symbol, token, act, last_symbol, arglist") X write(f2, " static alst, glst") X write(f2, " #global line_number, errors") X write(f2, " initial {") X every lname := "alst" |"glst" do { X encode(variable(lname)) ? { X writes(f2, "\t", lname, " := decode(\"") X if write(f2, move(47), "_") then { X while write(f2, "\t ",move(60), "_") X write(f2, "\t ", tab(0), "\")") X } X else write(f2, tab(0), "\")") X } X } X write(f2, "\t#") X write(f2, "\t# Uncomment the following if you want a look at the state and goto") X write(f2, "\t# tables. If you aren't planning on looking at them, find the") X write(f2, "\t# procedure definition for dump_lists below, and delete it. Why") X write(f2, "\t# keep it around if it's not being used?") X write(f2, "\t#") X write(f2, "\t# dump_lists(&errout, alst, glst)") X write(f2, " }") X write(f2, " #") X write(f2, " # New, not recursive, invocation; reset stacks, line number and") X write(f2, " # error count.") X write(f2, " #") X write(f2, " start_symbol := ", image(start_symbol)) X write(f2, " /err_state := 0") X write(f2, " /state_stack := [1] & line_number := 0 & errors := 0") X write(f2, " /value_stack := []") X write(f2, " /next_token := create iparse_tokens(stream)") X write(f2, "") X write(f2, " # &trace := -1") X write(f2, "") X write(f2, " while token := @next_token do {") X write(f2, "\trepeat {") X write(f2, "\t act := \\alst[state_stack[1]][token.sym] | {") X write(f2, "\t\t#") X write(f2, "\t\t# You can replace this error handler with whatever you") X write(f2, "\t\t# like, and have it do whatever you like.") X write(f2, "\t\t#") X write(f2, "\t\treturn iparse_error(alst, state_stack, value_stack,") X write(f2, "\t\t\t\t token, next_token, err_state + 1)") X write(f2, "\t }") X write(f2, "") X write(f2, "\t # If we're not working on an error production, assume that") X write(f2, "\t # we have a fully (re)synchronized parser, & clear err_state. ") X write(f2, "\t #") X write(f2, "\t (\\last_symbol | token.sym) == \"error\" | { err_state := 0 }") X write(f2, "\t last_symbol := token.sym") X write(f2, "") X write(f2, "\t case type(act) of {") X write(f2, "\t \"SH\" : {") X write(f2, "\t\t # push the next state onto the state stack") X write(f2, "\t \t push(state_stack, act.state)") X write(f2, "\t\t # push the current token's literal value onto the") X write(f2, "\t\t # value stack") X write(f2, "\t\t push(value_stack, token.str)") X write(f2, "\t\t # break out of enclosing repeat loop") X write(f2, "\t\t break") X write(f2, "\t }") X write(f2, "\t \"RE\" : {") X write(f2, "\t \t arglist := []") X write(f2, "\t\t #") X write(f2, "\t\t # Pop as many elements off of the token stack as") X write(f2, "\t\t # there are symbols in the right-hand side of the") X write(f2, "\t\t # rule. Push these elements onto an argument list.") X write(f2, "\t\t #") X write(f2, "\t\t every 1 to act.size do {") X write(f2, "\t\t pop(state_stack)") X write(f2, "\t\t push(arglist, pop(value_stack))") X write(f2, "\t\t }") X write(f2, "\t\t #") X write(f2, "\t\t # Check to goto list to see what state we should") SHAR_EOF true || echo 'restore of ibpag.icn failed' fi echo 'End of part 1' echo 'File ibpag.icn is continued in part 2' echo 2 > _shar_seq_.tmp exit 0 -- -Richard L. Goerwitz goer%midway@uchicago.bitnet goer@midway.uchicago.edu rutgers!oddjob!ellis!goer